Checkpointing algorithms and fault prediction
Similar resources
Checkpointing algorithms and fault prediction
This paper deals with the impact of fault prediction techniques on checkpointing strategies. We extend the classical first-order analysis of Young and Daly in the presence of a fault prediction system, characterized by its recall and its precision. In this framework, we provide an optimal algorithm to decide when to take predictions into account, and we derive the optimal value of the checkpoin...
Impact of fault prediction on checkpointing strategies
This paper deals with the impact of fault prediction techniques on checkpointing strategies. We extend the classical analysis of Young in the presence of a fault prediction system, which is characterized by its recall and its precision, and which provides either exact or window-based time predictions. We succeed in deriving the optimal value of the checkpointing period (thereby minimizing the wa...
Fault Tolerance and Checkpointing
Research on and applications of clusters of workstations are growing rapidly. One of the major areas is fault tolerance. This report addresses two issues of concern: correctness and performance. After describing a number of techniques to improve performance, it introduces two new research directions: diskless checkpointing and Java checkpointing.
Checkpointing and Rollback Recovery Algorithms for Fault Tolerance in MANETs: A Review
Sushant Patial, Department of Computer Science, Himachal Pradesh University, Shimla-5, Email: patialsushant@gmail.com. Jawahar Thakur, Department of Computer Science, Himachal Pradesh University, Shimla-5, Email: jawahar.hpu@gmail.com. ABSTRACT: ...
Fault-tolerant finite-element multigrid algorithms with hierarchically compressed asynchronous checkpointing
We examine novel fault tolerance schemes for data loss in multigrid solvers which essentially combine ideas of checkpoint-restart with algorithm-based fault tolerance. To improve efficiency compared to conventional global checkpointing, we exploit the inherent data compression of the multigrid hierarchy, and relax the synchronicity requirement through a local failure local recovery approach. We...
Journal
Journal title: Journal of Parallel and Distributed Computing
Year: 2014
ISSN: 0743-7315
DOI: 10.1016/j.jpdc.2013.10.010